High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies

Authors

  • Akihiro Suda
  • Hideki Takase
  • Kazuyoshi Takagi
  • Naofumi Takagi
Abstract

In high-level synthesis, parallelization for nested loop kernels has been hard due to their complex data dependencies, especially non-uniform dependencies. In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been vigorously studied in the software field. The key point of our contribution is a buffering method for parallel RAM accesses. The experimental result shows that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one.
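To make the term "non-uniform dependencies" concrete, the following is a minimal illustrative sketch and is not taken from the paper. It enumerates the flow-dependence distance vectors of a small two-level loop nest for two access patterns. The function name `flow_dependence_distances` and both access patterns are assumptions chosen for illustration only. A dependence is uniform when every distance vector is the same constant, and non-uniform when the distance varies with the iteration.

```python
from typing import Callable, Dict, List, Set, Tuple

Iteration = Tuple[int, int]  # (i, j) loop indices
Cell = Tuple[int, int]       # (row, column) of the accessed array element


def flow_dependence_distances(
    n: int,
    write: Callable[[int, int], Cell],
    read: Callable[[int, int], Cell],
) -> Set[Iteration]:
    """Brute-force the set of flow-dependence distance vectors.

    Models the loop nest
        for i in range(n):
            for j in range(n):
                A[write(i, j)] = f(A[read(i, j)])
    and returns every distance (reader - writer) between an iteration that
    writes a cell and a lexicographically later iteration that reads it.
    """
    iterations: List[Iteration] = [(i, j) for i in range(n) for j in range(n)]

    # Map each array cell to the iterations that write it.
    writers: Dict[Cell, List[Iteration]] = {}
    for it in iterations:
        writers.setdefault(write(*it), []).append(it)

    distances: Set[Iteration] = set()
    for it in iterations:
        for w in writers.get(read(*it), []):
            if w < it:  # tuple comparison is lexicographic execution order
                distances.add((it[0] - w[0], it[1] - w[1]))
    return distances


# Hypothetical uniform pattern: A[i][j] = f(A[i-1][j])
uniform = flow_dependence_distances(
    8, write=lambda i, j: (i, j), read=lambda i, j: (i - 1, j)
)

# Hypothetical non-uniform pattern: A[2*i][j] = f(A[i][j])
non_uniform = flow_dependence_distances(
    8, write=lambda i, j: (2 * i, j), read=lambda i, j: (i, j)
)

print(sorted(uniform))      # [(1, 0)]
print(sorted(non_uniform))  # [(1, 0), (2, 0), (3, 0)]
```

In the first pattern every dependence has distance (1, 0), so a single fixed schedule or skew handles all iterations. In the second, the distance grows with `i`, which is the kind of irregularity that makes such kernels hard to parallelize without a more general analysis.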

Similar resources

Automatic Hardware Synthesis of Nested Loops Using UET Grids and VHDL

This paper considers the automatic synthesis of systolic architectures from nested loop algorithmic specifications. The high level input is given in the form of uniform dependence loops with unit dependencies and the target architecture is a multidimensional systolic array with unbounded number of cells. A complete methodology for the hardware synthesis of the resulting architecture, based on V...

Extracting data flow information for parallelizing FORTRAN nested loop kernels

Currently available parallelizing FORTRAN compilers expend a large amount of effort in determining data independent statements in a program such that these statements can be scheduled in parallel without need for synchronisation. This thesis hypothesises that it is just as important to derive exact data flow information about the data dependencies where they exist. We focus on the ...

Automatic Parallelization of Non-uniform Dependences

This report summarizes our current experiences with Automatic Program Parallelization tools for converting sequential Fortran code for use on a multiprocessor computer. A number of such tools were evaluated, including Parafrase, Adaptor, PAT, Petit and the SUIF compiler package. We evaluated the suitability of such tools for parallelizing Computational Fluid Dynamics code supplied by the Army R...

An Optimized Three Region Partitioning Technique to Maximize Parallelism of Nested Loops With Non-uniform Dependences

Many methods for nested loop partitioning exist; however, most of them perform poorly when they partition loops with non-uniform dependences. This paper proposes a generalized and optimized loop partitioning mechanism which can exploit parallelism in nested loops with non-uniform dependences. Our approach, based on the region partitioning technique, divides the loop into variable size p...

Efficient Loop Scheduling and Pipelining for Applications with Non-Uniform Loops

Using parallel processing systems to compute scientific applications is one of the most common ways to achieve more efficient computing performance. In some applications, such as fluid mechanics, structural analysis, and solid state simulations, the dependencies across iterations (loop-carried dependencies) of the computation of array elements may be constants (uniform) or functions of array...

Publication date: 2013